3574 results found.
Written
Terminology,
Language Type:
Multilingual
Languages:
Arabic Dutch English French German Modern Greek Russian Spanish
Availability:
Freely Available
License:
Size:
4473 concepts
Production Status:
Existing-updated
Use:
Acquisition
- Paper title: Representing Multiword Term Variation in a Terminological Knowledge Base: a Corpus-Based Study
- Paper track: Terminology/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Pilar León-Araúz | EcoLexicon | /N |
Documentation:
https://ecolexicon.ugr.es/en/manual.htm
Embeddings and validation dictionaries,
Language Type:
Multilingual
Languages:
Chinese English Japanese Turkish
Availability:
License:
Size:
2.8 GB
Production Status:
Use:
Embeddings for reproduction results of unsupervised machine translation
- Paper title: A Closer Look on Unsupervised Cross-lingual Word Embeddings Mapping
- Paper track: Evaluation/poster presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Kamil Pluciński | Datasets used in attached paper | /N |
Documentation:
None
Mixed extension files
Text simplification system, including new dataset,
Language Type:
Monolingual
Languages:
English
Availability:
Public
License:
OpenSource
Size:
27.3 MByte Other
Production Status:
Use:
Text Simplification
- Paper title: CombiNMT: An Exploration into Neural Text Simplification Models
- Paper track: Evaluation/oral presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Michael Cooper | CombiNMT | /N |
Documentation:
OpenNMT documentation
Written
Corpus,
Language Type:
Bilingual
Languages:
English Italian
Availability:
From Owner
License:
Size:
318725 words
Production Status:
Newly created-finished
Use:
Document Classification, Text categorisation
- Paper title: DecOp: A Multilingual and Multi-domain Corpus For Detecting Deception In Typed Text
- Paper track: Written/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Pasquale Capuozzo | The DecOp corpus | /N |
Documentation:
None
Written
Corpus,
Language Type:
Multilingual
Languages:
Bengali English Gujarati Hindi Malayalam Marathi Oriya Punjabi Tamil Telugu Urdu
Availability:
Freely Available
License:
Free
Size:
500 MByte
Production Status:
Newly created-in progress
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: A Multilingual Parallel Corpora Collection Effort for Indian Languages
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Jerin Philip | CVIT Multilingual Parallel Corpus for Indian Languages | /N |
Documentation:
None
Written
Evaluation Data,
Language Type:
Multilingual
Languages:
Croatian English Finnish Slovenian
Availability:
Freely Available
License:
TBD
Size:
1554 entries
Production Status:
Newly created-in progress
Use:
Evaluation/Validation
- Paper title: CoSimLex: A Resource for Evaluating Graded Word Similarity in Context
- Paper track: Evaluation/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Carlos Santos Armendariz | CoSimLex | /N |
Documentation:
https://competitions.codalab.org/competitions/20905
Written
Corpus,
Language Type:
Bilingual
Languages:
English Hindi
Availability:
From Owner
License:
MIT
Size:
400000 sentences
Production Status:
Newly created-finished
Use:
Language Modelling
- Paper title: Minority Positive Sampling for Switching Points - an Anecdote for the Code-Mixing Language Modeling
- Paper track: Evaluation/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Amitava Das | Code Mixed Corpus | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
MIT License
Size:
104,990,418 tokens
Production Status:
Newly created-finished
Use:
Document Classification, Text categorisation
- Paper title: LEDGAR: A Large-Scale Multi-label Corpus for Text Classification of Legal Provisions in Contracts
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Don Tuggener | LEDGAR | /N |
Documentation:
The corpus is described in the accompanying paper submission
Written
Corpus,
Language Type:
Bilingual
Languages:
English French
Availability:
Freely Available
License:
CC
Size:
342 problems Other
Production Status:
Newly created-finished
Use:
Textual Entailment and Paraphrasing
- Paper title: A French Version of the FraCaS Test Suite
- Paper track: Written/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Maxime Amblard | French FraCas | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International
Size:
1511 entries
Production Status:
Newly created-in progress
Use:
Opinion Mining/Sentiment Analysis
- Paper title: Do You Really Want to Hurt Me? Predicting Abusive Swearing in Social Media
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Endang Pamungkas | Swear Word Abusiveness Dataset (SWAD) | /N |
Documentation:
None




